Grammatical Category Disambiguation by Statistical Optimization

Author

  • Steven J. DeRose
Abstract

Several algorithms have been developed in the past that attempt to resolve categorial ambiguities in natural language text without recourse to syntactic or semantic level information. An innovative method (called "CLAWS") was recently developed by those working with the Lancaster-Oslo/Bergen Corpus of British English. This algorithm uses a systematic calculation based upon the probabilities of co-occurrence of particular tags. Its accuracy is high, but it is very slow, and it has been manually augmented in a number of ways. The effects upon accuracy of this manual augmentation are not individually known. The current paper presents an algorithm for disambiguation that is similar to CLAWS but that operates in linear rather than in exponential time and space, and which minimizes the unsystematic augments. Tests of the algorithm using the million words of the Brown Standard Corpus of English are reported; the overall accuracy is 96%. This algorithm can provide a fast and accurate front end to any parsing or natural language processing system for English.
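
The contrast between linear and exponential time in the abstract can be illustrated with a small dynamic-programming sketch. This is not the paper's algorithm, whose details the abstract does not give. It is a generic best-path search over tag-pair probabilities, offered only to show why keeping one best predecessor per candidate tag avoids enumerating every tag sequence. The function name `best_tag_path`, the toy lexicon, the tag names, and all probability values are assumptions made up for illustration.

```python
import math

START = "<s>"  # hypothetical sentence-start marker


def best_tag_path(words, lexicon, transition, floor=1e-6):
    """Return the tag sequence with the highest product of tag-pair probabilities.

    words:      list of tokens
    lexicon:    dict mapping each word to its candidate tags (toy data)
    transition: dict mapping (previous_tag, tag) to a probability (toy data)
    floor:      probability assumed for tag pairs missing from `transition`
    """
    scores = {START: 0.0}   # log score of the best path ending in each tag
    backpointers = []       # one dict per word: tag -> best previous tag

    def step(prev, tag):
        # Log score of extending the best path ending in `prev` with `tag`.
        return scores[prev] + math.log(transition.get((prev, tag), floor))

    for word in words:
        new_scores, pointers = {}, {}
        for tag in lexicon[word]:
            # Keep only the single best predecessor for this candidate tag.
            prev = max(scores, key=lambda p: step(p, tag))
            new_scores[tag] = step(prev, tag)
            pointers[tag] = prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final tag to recover the whole path.
    tag = max(scores, key=scores.get)
    path = []
    for pointers in reversed(backpointers):
        path.append(tag)
        tag = pointers[tag]
    path.reverse()
    return path


# Toy data, invented for this example only.
lexicon = {
    "the": ["DET"],
    "can": ["NOUN", "VERB", "MODAL"],
    "rusts": ["VERB", "NOUN"],
}
transition = {
    (START, "DET"): 0.5,
    ("DET", "NOUN"): 0.6, ("DET", "VERB"): 0.01, ("DET", "MODAL"): 0.01,
    ("NOUN", "VERB"): 0.3, ("NOUN", "NOUN"): 0.1,
    ("VERB", "NOUN"): 0.2, ("VERB", "VERB"): 0.05,
    ("MODAL", "VERB"): 0.5, ("MODAL", "NOUN"): 0.01,
}

print(best_tag_path(["the", "can", "rusts"], lexicon, transition))
# prints ['DET', 'NOUN', 'VERB']
```

The work per word depends only on the number of candidate tags for that word and the previous one, so total cost grows linearly with sentence length. Scoring every full tag sequence separately would instead grow with the product of the per-word tag counts.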

Similar resources

Multiobjective Genetic Programming for Natural Language Parsing and Tagging

Parsing and Tagging are very important tasks in Natural Language Processing. Parsing amounts to searching the correct combination of grammatical rules among those compatible with a given sentence. Tagging amounts to labeling each word in a sentence with its lexical category and, because many words belong to more than one lexical class, it turns out to be a disambiguation task. Because parsing a...

Explorations in Using Grammatical Dependencies for Contextual Phrase Translation Disambiguation

Recent research has shown the importance of using source context information to disambiguate source phrases in phrase-based Statistical Machine Translation. Although encouraging results have been obtained, those studies mostly focus on translating into a less inflected target language. In this article, we present an attempt at using source context information to translate from English into Fren...

Automatic interlinear glossing as two-level sequence classification

Interlinear glossing is a type of annotation of morphosyntactic categories and crosslinguistic lexical correspondences that allows linguists to analyse sentences in languages that they do not necessarily speak. Automatising this annotation is necessary in order to provide glossed corpora big enough to be used for quantitative studies. In this paper, we present experiments on the automatic gloss...

Automatic Construction and Global Optimization of a Multisentiment Lexicon

Manual annotation of sentiment lexicons costs too much labor and time, and it is also difficult to get accurate quantification of emotional intensity. Besides, the excessive emphasis on one specific field has greatly limited the applicability of domain sentiment lexicons (Wang et al., 2010). This paper implements statistical training for large-scale Chinese corpus through neural network languag...

Part of Speech Tagger for Kannada

Parts of speech tagging is a well-understood problem in NLP. The importance of the problem focuses from the fact that the Parts of Speech tagging is one of the first stages in the process performed by various natural language related process. POS tagging is the process of assigning the part of speech tag or other lexical class marker to each and every word in a sentence. POS tagging has a cruci...

Journal:
  • Computational Linguistics

Volume 14

Publication date: 1988